System and method for dynamic compression of data

ABSTRACT

A method of transferring a compressed web page over a computer network without affecting the existing web server applications and processes. The compressor intercepts a request from a workstation for the web page. A second request is transmitted to the server from the compressor for the original, uncompressed web page. The web page is selectively compressed in the compressor. Then the compressed web page is transmitted to the workstation. Optionally, some of the files associated with the web page, such as image files, are also compressed, and the references to the compressed associated files are changed to reflect any change in the names of the compressed associated files.

BACKGROUND

[0001] The present invention relates to a method of compressing data for transmission.

[0002] The Internet is a publicly accessible worldwide network which primarily uses the Transmission Control Protocol and Internet Protocol (“TCP/IP”) to permit the exchange of information. The Internet supports several application protocols including the Hypertext Transfer Protocol (“HTTP”) for facilitating the exchange of HTML/World Wide Web (“WWW”) content, File Transfer Protocol (“FTP”) for the exchange of data files, electronic mail exchange protocols, Telnet for remote computer access and Usenet for the collaborative sharing and distribution of information.

[0003] Several compression techniques have been used to reduce the time required to transfer files. Compression can occur at the file or the bit stream level. Applications, such as PKZIP™, compress files on a computer. Modems use bit stream level compression techniques to optimize throughput. Microsoft's Windows NT™ servers include IIS that can provide static web files in the deflate format. Some web browsers (“browsers”) support the “deflate” format.

[0004] Traditionally, a workstation accesses a web page across the Internet by transmitting a request for the web page to a web server. The web server then processes the request and transmits the web page to the workstation. The web page is a file having hypertext markup language (“HTML”) codes. Once the workstation receives the web page, the workstation analyzes the HTML codes for references to associated files, such as graphic files, video files, audio files and other files. The workstation then sends a second request to the web server for the associated files. The web server then sends the associated files to the workstation. This network congestion management mechanism is called “HTTP slow start.”

[0005] Since the associated files are requested only after the initial web page file has been received and analyzed at the workstation, the speed of receiving the initial web page file is a controlling factor in the overall speed of viewing a web page. In order to utilize compression techniques as a speed enhancement, conventional systems require the web server to store separate pre-compressed web pages. Others also require the web server to send additional code to decode the web page.

BRIEF SUMMARY

[0006] A method of transferring a compressed web page over a computer network without affecting the existing web server applications and processes. The compressor intercepts a request from a workstation for the web page. A second request is transmitted to the server from the compressor for the original, uncompressed web page. The web page is selectively compressed in the compressor. Then the compressed web page is transmitted to the workstation.

BRIEF DESCRIPTION OF THE DRAWINGS

[0007] The present invention is described with reference to the accompanying figures. In the figures, like reference numbers indicate identical or functionally similar elements. Additionally, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears.

[0008] FIG. 1 is an illustration of an embodiment of the present invention including dynamic compression at the web server;

[0009] FIG. 2 is an illustration of an embodiment of the present invention including a data center near the web server;

[0010] FIG. 3 is an illustration of an embodiment of the present invention including a data center near the workstation;

[0011] FIG. 4 is a flow chart of an embodiment of the present invention including dynamic compression near a web server;

[0012] FIG. 5 is an illustration of a first embodiment of the present invention including dynamic compression in a data center.

DEFINITIONS

[0013] A proxy server with cache is a server that is located between a client application, such as a web browser, and a real server where the proxy is at the client's side of the network. The proxy server intercepts content requests to the real server to see if it can fulfill the requests itself out of its cache storage. If not, or if the proxy server does not have a cache, it substitutes its own IP address for that of the originating client, makes a notation to associate the returned response to the client, and forwards the request onto the real server.

[0014] A reverse proxy server is a server that is located between a client application, such as a web browser, and a real server at the server's side of the network. Optionally, the reverse proxy server has a cache.

[0015] A forward proxy server sits between a workstation and the Internet to ensure security, administrative control and optionally provide caching services. A forward proxy server can be associated with a gateway server which separates the workstation's local network from the Internet or other network. The forward proxy server can also be associated with a firewall server which protects the local network from outside intrusion. The forward proxy server receives content requests from workstations requesting web pages and web page content from the web server. The forward proxy server then transmits a request for the web page or content to the web server on behalf of the workstation. The forward proxy server modifies the identity of the requester to be that of the forward proxy server. This is typically achieved by altering the Internet Protocol address of the requester. A forward proxy server can also be a cache server.

[0016] Portable Network Graphics (PNG) format is a new bit-mapped graphics format similar to the GIF format. A graphics file stored in the PNG format is more compressed than the same file stored in the GIF format. The conversion from GIF to PNG format is a lossless conversion. Web browsers such as Netscape Navigator™ and Microsoft Internet Explorer™ support PNG.

[0017] Internet Information Server (“IIS”) is Microsoft's Web server that runs on Windows NT™ platforms. IIS supports the open-source compression algorithm called deflate.

[0018] Huffman encoding is a prefix coding prepared by a special algorithm. Each code is a series of bits, either 0 or 1, that represents an element in a specific “alphabet”, such as the set of ASCII characters. This is the primary but not the only use of Huffman coding in the deflate compression technique.

[0019] A Huffman algorithm starts by assembling the elements of the alphabet, each one being assigned a weight, i.e., a number that represents its relative frequency within the data to be compressed. These weights may be guessed at beforehand, or they may be measured exactly by examining the data to be compressed, or some combination thereof. The elements are selected two at a time, the elements with the lowest weights being chosen first. The two chosen elements are made to be leaf nodes of a node with two branches. For example, a set of elements and weights can look like this:

Element   Weight
A         16
B         32
C         32
D         8
E         8
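
The merging procedure described in [0019] can be sketched as follows. This is a minimal illustration rather than the method of any particular deflate implementation; the function name `huffman_code_lengths` and the tie-breaking counter are assumptions introduced here. It computes only the code length of each element, using the example weights A 16, B 32, C 32, D 8, E 8.

```python
import heapq
from itertools import count


def huffman_code_lengths(weights):
    """Return {element: code length} by repeatedly merging the two lightest nodes."""
    tiebreak = count()  # unique counter so the heap never has to compare dicts
    heap = [(w, next(tiebreak), {sym: 0}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # lowest weight
        w2, _, right = heapq.heappop(heap)  # next lowest weight
        # Every element under the new parent node sits one level deeper.
        merged = {s: depth + 1 for s, depth in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


weights = {"A": 16, "B": 32, "C": 32, "D": 8, "E": 8}
lengths = huffman_code_lengths(weights)
total_bits = sum(weights[s] * lengths[s] for s in weights)
```

With this tie-breaking order, D and E (the two lightest elements) are merged first and end up with the longest codes, while the heavier elements receive shorter ones. Other tie-breaking orders can produce different trees with the same total cost.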

[0020] Lempel-Ziv (LZ) encoding including LZ77, LZ78, LZW and others are dictionary based substitution compression techniques. LZ77 compression works by finding sequences of data that are repeated. LZ77 compression uses a “sliding window”; that is, at any given point in the data, there is a record of what characters went before.
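
The sliding-window idea in [0020] can be illustrated with a deliberately naive sketch. The token format (a literal character or a `(distance, length)` pair), the window size, and the function names are assumptions made for illustration; real deflate encoders use hash chains and a 32 KB window rather than this brute-force search.

```python
def lz77_tokens(data, window=32, min_match=3):
    """Greedy LZ77: emit literal characters or (distance, length) back-references."""
    i, out = 0, []
    while i < len(data):
        best_len, best_dist = 0, 0
        # Search only the last `window` characters that went before position i.
        for j in range(max(0, i - window), i):
            k = 0
            # A match may run past position i (overlapping copy).
            while i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_dist = k, i - j
        if best_len >= min_match:
            out.append((best_dist, best_len))
            i += best_len
        else:
            out.append(data[i])
            i += 1
    return out


def lz77_expand(tokens):
    """Rebuild the original string from the token list."""
    out = []
    for t in tokens:
        if isinstance(t, tuple):
            dist, length = t
            for _ in range(length):
                out.append(out[-dist])  # copy one character from `dist` back
        else:
            out.append(t)
    return "".join(out)


tokens = lz77_tokens("abcabcabc")
```

For the input `"abcabcabc"` the repeated sequence is replaced by a single back-reference of distance 3 and length 6 after the three initial literals.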

[0021] Deflate compression can be accomplished with a variety of techniques as defined in the deflate specification, RFC 1951, herein incorporated by reference. The three modes of compression available to the compressor include:

[0022] First, no compression. This mode is selected when the data analyzed has already been compressed.

[0023] Second, compression with LZ77 and then with standard Huffman coding. The trees that are used to compress in this mode are defined by the Deflate specification itself and preloaded in any deflate decoding capable software, and so no extra space needs to be taken to store those trees or send them to the receiver of the compressed file.

[0024] Third, compression with LZ77 and then with Huffman coding with trees that the compressor creates by examination of the file and stores along with the data.

[0025] The data is broken up in “blocks,” and each block uses a single mode of compression. If the compressor wants to switch from non-compressed storage to compression with the trees defined by the specification, or to compression with specified Huffman trees, or to compression with a different pair of Huffman trees, the current block is ended and a new one begun.
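
The block modes listed above can be observed with Python's standard `zlib` module, as in the following sketch. The helper names `raw_deflate` and `first_block_type` and the sample page are assumptions for illustration. Per RFC 1951, the first byte of a raw deflate stream carries the BFINAL flag in bit 0 and the block type (BTYPE) in bits 1 and 2, where 0 means stored, 1 means the fixed trees from the specification, and 2 means trees stored along with the data.

```python
import zlib


def raw_deflate(data: bytes, level: int, strategy: int) -> bytes:
    """Compress to a raw RFC 1951 stream (wbits=-15 means no zlib/gzip wrapper)."""
    c = zlib.compressobj(level, zlib.DEFLATED, -15, 8, strategy)
    return c.compress(data) + c.flush()


def first_block_type(stream: bytes) -> int:
    """BTYPE of the first block: 0 = stored, 1 = fixed Huffman, 2 = dynamic Huffman."""
    # Bit 0 of the first byte is BFINAL; bits 1-2 are BTYPE.
    return (stream[0] >> 1) & 0b11


page = b"<html><body>" + b"<p>hello</p>" * 50 + b"</body></html>"

stored = raw_deflate(page, 0, zlib.Z_DEFAULT_STRATEGY)  # level 0: no compression
fixed = raw_deflate(page, 9, zlib.Z_FIXED)              # trees from the specification
chosen = raw_deflate(page, 9, zlib.Z_DEFAULT_STRATEGY)  # encoder picks a mode per block
```

With the default strategy the encoder decides per block which mode is cheapest, so the block type of `chosen` depends on the input; the first two calls force the stored and fixed-tree modes respectively.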

[0026] HyperText Markup Language (“HTML”) is the authoring language used to create documents on the World Wide Web (Web). HTML defines the structure and layout of a Web document by using a variety of tags and attributes. The correct structure for an HTML document starts with <HTML><HEAD> “text describing the document”</HEAD><BODY> and ends with </BODY></HTML>. The information included in the Web page is located between the <BODY> and </BODY> tags.

[0027] An example of an HTML reference to an image file called “cpu.gif” is:

[0028] <img border=“0” src=“cpu.gif” width=“79” height=“75”>

[0029] There are numerous other HTML tags used to format and lay out the information in a Web page. For example, <P> is used to begin paragraphs and <I> and </I> are used to italicize text. Tags are also used to specify hypertext links.

[0030] Internet Server API (“ISAPI”) is an application programming interface (“API”) for Microsoft's Internet Information Server (“IIS”) web server. ISAPI enables programmers to develop Web-based applications that run much faster than conventional CGI programs. ISAPI can support requests from multiple workstations with only a single instance running on the server. CGI requires a separate instance for each request.

[0031] ZLIB is a lossless data compression technique that uses the deflate technique to compress only the body of web pages and not the headers.
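
The split between uncompressed headers and a compressed body can be sketched with Python's `zlib` module. The function names and the specific header lines are assumptions for illustration; in HTTP/1.1 the “deflate” content coding denotes the zlib container around deflate data, which is what `zlib.compress` produces.

```python
import zlib


def build_response(body: bytes) -> bytes:
    """Compress only the body; the header block stays readable text."""
    compressed = zlib.compress(body)
    headers = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html\r\n"
        "Content-Encoding: deflate\r\n"
        f"Content-Length: {len(compressed)}\r\n"
        "\r\n"
    ).encode("ascii")
    return headers + compressed


def read_body(response: bytes) -> bytes:
    """Split at the first blank line, then decompress what follows."""
    _head, _sep, payload = response.partition(b"\r\n\r\n")
    return zlib.decompress(payload)


page = b"<HTML><HEAD>demo</HEAD><BODY>" + b"<P>text</P>" * 20 + b"</BODY></HTML>"
response = build_response(page)
```

The receiver can parse the headers without any decompression step and learns from `Content-Encoding` that the body must be inflated before the HTML is processed.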

DETAILED DESCRIPTION

[0032] Transfer speed of files over the Internet is a critical factor in the usability of the Internet. Many techniques currently exist to increase the speed of file transfers. The present invention intercepts web page requests then compresses the web page, which is usually an HTML file, and sends it to the requesting workstation in the compressed format. The requesting workstation then decompresses the web page before processing the web page. Optionally, the tags in the web page that point to image files, for example GIF files, are modified to point to modified image files of a different name, for example PNG files. This process can occur at an ISP or other form of point of presence to the Internet, at a forward proxy server, at a reverse proxy server, at a transparent proxy server or at the web server.
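
The optional tag modification can be sketched as a text rewrite over the HTML. The regular expression and the function name `rewrite_gif_refs` are assumptions for illustration; the sketch handles only double-quoted `src` attributes of `<img>` tags written with plain ASCII quotes, and it changes names only, so the image files themselves would have to be converted separately.

```python
import re

# Matches: <img ... src="NAME.gif"  and captures the parts around ".gif".
_IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc\s*=\s*")([^"]+)\.gif(")', re.IGNORECASE)


def rewrite_gif_refs(html: str) -> str:
    """Point <img> references at .png files instead of .gif files."""
    return _IMG_SRC.sub(r"\1\2.png\3", html)


before = '<img border="0" src="cpu.gif" width="79" height="75">'
after = rewrite_gif_refs(before)
```

Links that are not image references, such as `<a href="cpu.gif">`, are left unchanged by this pattern. A production rewriter would more likely use an HTML parser than a regular expression.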

[0033] FIG. 1 is an illustration of an embodiment of the present invention including dynamic compression at the web server, i.e., as the content is served or transferred in response to the request, it is compressed. The system 100 includes a workstation 102, connected with a network 104 connected with a web server 106 that includes a compressor 108. The workstation 102 can be a personal computer, a network enabled mobile device such as a cellular telephone, a portable computer, a WebTV like device or other Internet appliance, a personal digital assistant such as a Palm Pilot™ manufactured by 3-Com Corporation, or other device that can communicate with a server. The network 104 can be the Internet, a private network, an intranet, or other network including a combination of networks. The web server 106 can be any computer that stores or retrieves web pages or web files for others over a network. The compressor 108 can be integrated with the web server 106 or a standalone unit. For example, the compressor 108 can be a PCI adapter card in the web server 106 or it can be a separate computer connected with the web server 106.

[0034] In an embodiment illustrated in FIG. 1, a user at the workstation 102 sends a request to the web server 106 over the network 104 for a web page. The user may be running a web browser such as Internet Explorer™ or Netscape Navigator™. When the request reaches the web server 106, the compressor 108 intercepts the request and determines if the workstation 102 can process compressed web pages. Various mechanisms can be used by the compressor 108 to make this determination. In one embodiment, the compressor 108 analyzes the request to determine if the request's header contains an indication that the workstation can handle compressed files. For example, the header can contain a “request encoding =deflate” indication.

[0035] If the workstation can handle compressed files, then the compressor 108 compresses the web page returned by the web server 106 before the web page is transmitted to the workstation 102.
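
The header check and the conditional compression described above can be sketched as follows. In HTTP/1.1 the indication is normally carried in the `Accept-Encoding` request header; treating that header as the indication meant here is an assumption, as are the function names. The sketch ignores wildcard (`*`) entries.

```python
import zlib


def accepts_deflate(headers: dict) -> bool:
    """True if the request headers list 'deflate' with a non-zero quality value."""
    value = ""
    for name, v in headers.items():
        if name.lower() == "accept-encoding":
            value = v
    for item in value.split(","):
        parts = [p.strip() for p in item.split(";")]
        coding, q = parts[0].lower(), 1.0
        for p in parts[1:]:
            if p.lower().startswith("q="):
                try:
                    q = float(p[2:])
                except ValueError:
                    q = 0.0  # treat a malformed quality value as "not acceptable"
        if coding == "deflate" and q > 0:
            return True
    return False


def maybe_compress(headers: dict, page: bytes):
    """Return (payload, extra response headers), compressing only when allowed."""
    if accepts_deflate(headers):
        return zlib.compress(page), {"Content-Encoding": "deflate"}
    return page, {}
```

For example, `maybe_compress({"Accept-Encoding": "gzip, deflate"}, page)` returns the compressed page together with a `Content-Encoding` header, while a request without the header receives the page unchanged.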

[0036] It is preferred that the web server 106 operations not be affected by the compressor 108. That is, the web server applications are not affected and the compressor 108 performs the compression transparently to the web server 106, e.g., neither data throughput nor data processing power is impacted.

[0037] Another embodiment of the compressor 108 utilizes a novel mode of deflate compression using LZ77 in combination with Huffman coding. The Huffman coding utilizes trees that are predefined based on an analysis of HTML codes. Since some HTML code must exist in every HTML page and others occur at frequencies that can be predicted, the Huffman tree can be generated before the compression to save time compressing the data. For example:

HTML Code    Huffman Code
<html>       0
</html>      1
<head>       00
</head>      01
<title>      000
</title>     001
<b>          10
</b>         11
www.         110
<img src=    1110
<a href=     1111

[0038] While the above is a simplified example, it illustrates how the most commonly occurring character strings can be replaced by a predetermined Huffman tree that is optimized for HTML code.
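
Standard deflate decoders accept only the fixed trees of the specification or trees carried in the stream, so the predefined HTML tree of [0037] cannot be demonstrated with stock tools. A related but different mechanism, shown here purely as an analogy, is the zlib preset dictionary: both sides agree beforehand on a byte string of common HTML fragments, which primes the LZ77 window rather than the Huffman tree. The dictionary contents and function names below are assumptions for illustration.

```python
import zlib

# Hypothetical shared dictionary built from the strings in the example table.
# zlib favors material near the end, so the most common strings go last.
HTML_DICT = b'<a href="<img src="www.</b><b></title><title></head><head></html><html>'


def compress_with_dict(page: bytes) -> bytes:
    c = zlib.compressobj(level=9, zdict=HTML_DICT)
    return c.compress(page) + c.flush()


def decompress_with_dict(blob: bytes) -> bytes:
    # The receiver must hold the identical dictionary.
    d = zlib.decompressobj(zdict=HTML_DICT)
    return d.decompress(blob) + d.flush()


page = b"<html><head><title>Hi</title></head><body><b>x</b></body></html>"
blob = compress_with_dict(page)
restored = decompress_with_dict(blob)
```

As with the predefined tree, nothing describing the shared knowledge travels with each page; the cost is that a receiver without the dictionary cannot decode the stream.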

[0039] In the embodiment shown in FIG. 1, when the compressor 108 is integrated with the web server 106, the compressor 108 can be a software application, a hardware implementation, or a combination of hardware and software. In an embodiment with a hardware compressor 108, the compressor 108 can be an adapter card in the web server 106 or a blade in a router, i.e., a rack mount printed circuit board or other type of adapter inserted in the backplane or motherboard or otherwise coupled with the router (not shown). In another embodiment, the compressor 108 is a box connected with the web server 106 via a SCSI bus, universal serial bus, parallel port, serial port, fiber channel, fast Ethernet, or other communication channel. In this embodiment, the web server 106 receives a request for a web page and an indication that the workstation 102 can handle web pages in the deflate format. The web server 106 processes the request similar to traditional web page requests, which can include using Microsoft's IIS and ISAPI. However, before the web page is transmitted over the network 104, it is sent to the compressor 108 for processing. The compressor 108 compresses the web page and optionally converts references to associated files such as image files in the web page to match a second naming convention. The web page is then transmitted over the network 104.

[0040] It is preferred that when the compressor 108 is called by the web server's API, two pointer addresses are provided. The first is the address of the data for the client and the second is the address to place the compressed data. Further, it is preferred that the compressor 108 perform the encoding at a rate of at least 200 Mbps. The hash tables used for encoding are stored on the compressor 108 in a fast media such as flash memory or other integrated storage device. It is preferred that the hash tables be software programmable.

[0041] The ISAPI filter receives the data from the client, passes it to the compressor 108, and then sends the encoded data to the client.

[0042] It is preferred that the compressor 108 is a PCI interface adapter that supports Windows NT, Linux, OpenBSD, and Sun Solaris operating systems and supports Microsoft's IIS versions 4.0 and 5.0, Apache, iPlanet, and Lotus' Domino web server software. It is further preferred that the compressor 108 be a server co-processor that reduces the load on the web server's processing capacity, performs wire-speed compression at up to 45 Mbps, and compresses web pages by a ratio of 40 to 1.

[0043] The workstation 102 can indicate that it wishes to receive web pages in a compressed format in a variety of ways. First, the workstation 102 can include in the web page request an indication flag that indicates that web page compression is acceptable. The same flag or an additional flag can be used to indicate that the associated files, for example GIF files, can be compressed. Second, the workstation 102 can send a request to the compressor 108 or data center 208 (FIG. 2) indicating that compression is acceptable for that workstation 102. Other means of indicating that the workstation 102 can accept compressed files can also be used. While the terms compressor and data center have been used to describe devices that are located near the web server and between the web server and the workstation, those terms are not limited to the embodiments used to illustrate their functionality.

[0044] FIG. 2 is an illustration of an embodiment of the present invention including a data center 208 near the web server 206 in a reverse proxy server configuration. The system 200 includes a workstation 202 connected with a network 204 connected with a data center 208 connected with the web server 206. The data center 208 intercepts web pages and compresses them before they are transmitted over the network 204 to the workstation 202. The data center 208 can be connected with the web server 206 via a network, a direct connection, or other means. It is preferred that the data center 208 is directly connected with the web server 206. In one embodiment, the data center 208 can be used to compress all appropriate web pages and associated files from the web server.

[0045] FIG. 3 is an illustration of an embodiment of the present invention including a data center 308 near the workstation 302 in a proxy server configuration. The system 300 includes a workstation 302 connected with a data center 308 connected with a network 304 connected with the web server 306. The data center 308 intercepts web pages from the network 304 and compresses them before they are transmitted to the workstation 302. It is preferred that workstation 302 is separated from the data center 308 by a network link. The network link can be a dial-up connection, dedicated connection, an intranet, or a private network.

[0046] In this embodiment, the data center 308 can be located at an Internet service provider (ISP) site to compress the web pages and associated files that are sent over the dial-up link to the workstation 302. Alternatively, the data center 308 could be located on a company's intranet network to compress all web pages routed from the Internet to workstations on the intranet.

[0047] In one embodiment, the data center 308 compresses all web pages for a particular workstation or set of workstations. The data center 308 builds a list of workstations 302 that wish to accept compressed web pages, then all web pages sent to those workstations 302 are compressed.
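
The list of workstations described in [0047] can be sketched as a small registry keyed by workstation address. The class name, the method names, and the use of an address string as the workstation identifier are assumptions for illustration.

```python
import zlib


class CompressionRegistry:
    """Tracks which workstations have asked to receive compressed web pages."""

    def __init__(self):
        self._opted_in = set()

    def register(self, address: str) -> None:
        self._opted_in.add(address)

    def unregister(self, address: str) -> None:
        self._opted_in.discard(address)

    def prepare(self, address: str, page: bytes):
        """Return (payload, compressed?) for a page bound for `address`."""
        if address in self._opted_in:
            return zlib.compress(page), True
        return page, False


registry = CompressionRegistry()
registry.register("10.0.0.7")  # hypothetical workstation address
```

Once a workstation is on the list, every page routed to it is compressed without inspecting each individual request; pages for other workstations pass through untouched.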

[0048] FIG. 4 is a flow chart of an embodiment of the present invention including dynamic compression near a web server. The method of compressing data for efficient transfer includes the following steps:

[0049] In 402, a request for a web page is received at a data center or compressor. This request can be received at the ISP or other point-of-presence, across a network, or at the web server. A data center 208 (FIG. 2) or a compressor 108 (FIG. 1) can receive the web pages. Alternatively, another computer can intercept the web pages and transmit them to the data center 208 or compressor 108.

[0050] In 404, a second request is transmitted to a web server having the web page. The web page is received in response to the second request.

[0051] In 406, the request is analyzed to determine if the desired web page can be compressed. Other methods of determining if the web page can be compressed can also be used. For example, a default setting can be established for a workstation that is known to be capable of receiving compressed data or the workstation can send a separate message to the compressor indicating that the web pages should be compressed. The request can be analyzed any time after the request is received and before the web page is selectively compressed. Optionally, the request can be analyzed to determine if the web page's associated files can also be compressed.

[0052] In 408, the web page is selectively compressed. Only web pages that are to be sent to workstations that are known to be able to handle compressed files are compressed.

[0053] Optionally, references in the web page to associated files that will be compressed are altered to reflect any change such as the associated file's name, extension, or other code. Alternatively, the request for the associated files can be analyzed to determine if the associated files can be compressed. The determination can be based on the workstation's ability to handle compressed files. Since the web page was compressed, the associated files sent to the same workstation usually can also be compressed. Thus, the determination can be based on the request for the associated files, for example a flag can be set in the request, or the compressor can keep track of the workstation that received the web page and automatically compress the associated files.

[0054] In 410, the selectively compressed web page is transmitted toward the workstation. Depending on where the compression occurs, the compressed web page is transmitted toward the workstation over the Internet, over a dial-up line, over a dedicated line, over an intranet, or over some other connection.

[0055] In 412, a third request is received that requests the web page's associated files.

[0056] In 414, a fourth request is transmitted to the web server for the associated files.

[0057] In 416, the associated files are selectively compressed. For example, graphic files in the GIF format can be converted to the PNG format using a lossless transformation.

[0058] In 418, the associated files are transmitted toward the workstation.
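
Steps 402 through 418 can be summarized in a short sketch. The function `handle_request`, the `fetch_from_origin` callable standing in for the request to the web server, and the use of the `Accept-Encoding` header as the compression indication are all assumptions for illustration. The check is a simple substring test and ignores quality values.

```python
import zlib
from typing import Callable, Dict, Tuple


def handle_request(
    path: str,
    request_headers: Dict[str, str],
    fetch_from_origin: Callable[[str], bytes],
) -> Tuple[bytes, Dict[str, str]]:
    """Intercept one request (402/412) and return (payload, response headers)."""
    # 404/414: a further request goes to the web server for the original content.
    original = fetch_from_origin(path)
    # 406: analyze the request to decide whether compression is acceptable.
    wants_deflate = "deflate" in request_headers.get("Accept-Encoding", "").lower()
    # 408/416: selectively compress.
    if wants_deflate:
        return zlib.compress(original), {"Content-Encoding": "deflate"}
    # 410/418: otherwise the content is forwarded unchanged.
    return original, {}


# Hypothetical in-memory stand-in for the web server.
ORIGIN = {
    "/index.html": b"<html><body>" + b"<p>hi</p>" * 40 + b"</body></html>",
    "/style.css": b"p { margin: 0 }" * 20,
}

payload, headers = handle_request(
    "/index.html", {"Accept-Encoding": "deflate"}, ORIGIN.__getitem__
)
```

The same function serves the follow-up requests for associated files, so the web server itself sees only ordinary, uncompressed requests and responses.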

[0059] FIG. 5 is an illustration of an embodiment of the present invention including a data center. The method 500 of compressing files for transfer includes the following:

[0060] In 502, a request is received at a data center or compressor from a remote web browser requesting a web page from a web server. For example, the request can be received at an ISP, across the network, or at the web server.

[0061] In 504, the request is analyzed to determine if the web page can be compressed and optionally if the web page's associated files can be compressed.

[0062] In 506, the data center sends a request to the web server for the web page. Alternatively, the data center can retrieve the web page directly.

[0063] In 508, the web page is received, selectively compressed, and references in the web page to associated files are selectively changed at the data center.

[0064] In 510, the selectively compressed and selectively modified web page is transmitted toward the workstation.

[0065] In 512, a request for the files associated with the web page is received at the data center from the web browser.

[0066] In 514, the associated files are requested from the web server. The associated files can be a single file or a plurality of files.

[0067] In 516, the data center receives the associated files and selectively compresses and selectively renames the files. For example, the graphics file “test.gif” in the GIF format can be deflated and renamed “test.png” in the PNG format.

[0068] In 518, the associated files are transmitted to the workstation.

[0069] While preferred embodiments have been shown and described, it will be understood that they are not intended to limit the disclosure, but rather they are intended to cover all modifications and alternative methods and apparatuses falling within the spirit and scope of the invention as defined in the appended claims or their equivalents.

What is claimed is:
 1. A method of transferring data, comprising: (a) dynamically compressing a web page based on a first request from a workstation, wherein the first request indicates the web page; and (b) transmitting a first response that comprises the dynamically compressed web page from a compressor toward the workstation.
 2. The method of claim 1, wherein (a) further comprises: determining if the first request comprises an indication that the workstation is capable of decompressing the web page; and selectively compressing the web page based on the workstation's capability to decompress the web page.
 3. The method of claim 2, wherein (a) further comprises: receiving the first request from a browser on the workstation over a computer network.
 4. The method of claim 3, wherein the transmitting further comprises: transmitting the first response that comprises a web page.
 5. The method of claim 4, wherein (a) further comprises: dynamically compressing the web page based on a compression indicator in the first request.
 6. The method of claim 4, wherein (b) further comprises: transmitting a compression indicator in the first response that indicates the web page is in a compressed format.
 7. The method of claim 6, wherein (a) further comprises: compressing the web page into a deflate compressed data format.
 8. The method of claim 7, wherein (a) further comprises: retrieving the web page from a web server.
 9. The method of claim 8, further comprising: (c) receiving a second request from the workstation indicating an associated file; and (d) transmitting a second response from the compressor to the workstation in response to a second request; wherein the second response comprises at least the associated file.
 10. The method of claim 9, wherein (d) further comprises: transmitting a plurality of files associated with the web page.
 11. The method of claim 10, wherein (d) comprises: transmitting a set of graphic, audio, and video files associated with the web page.
 12. The method of claim 9, wherein (d) further comprises: transmitting the associated file in a compressed format.
 13. The method of claim 9, wherein (d) further comprises: selectively transmitting the associated file in a compressed format based on an indication in the second request that the workstation is capable of processing the associated file in a compressed format.
 14. The method of claim 14, wherein (d) comprises: selectively transmitting the associated file in a PNG format.
 15. A method of transferring data over a computer network, comprising: (a) receiving a first request from a workstation, the first request indicating a web page and a web server associated with the web page; (b) transmitting a second request to the web server, the second request requesting the web page; (c) receiving the web page in a first format from the web server; (d) selectively compressing the web page to a second format, after the receiving of the web page; and (e) transmitting the web page to the workstation in the second format.
 16. The method of claim 15, wherein (a) comprises: receiving the first request over the computer network.
 17. The method of claim 16, wherein (a) comprises: receiving the first request at a reverse proxy server.
 18. The method of claim 16, wherein (d) comprises: selectively compressing the web page at the web server.
 19. The method of claim 16, wherein (b) comprises: transmitting the second request over a computer network.
 20. The method of claim 19, wherein (a) comprises: receiving the first request at a forward proxy server.
 21. The method of claim 19, wherein (d) comprises: selectively compressing the web page at a point of presence of the computer network.
 22. The method of claim 15, wherein (d) comprises: selectively compressing the web page using a deflate process.
 23. The method of claim 15, wherein (d) comprises: modifying a reference in the web page, wherein the reference refers to an associated file.
 24. The method of claim 23, further comprises: (f) receiving a second request from the workstation, the second request indicating at least the associated file; (g) transmitting a third request to the web server for the associated file; (h) receiving the associated file in a third format from the web server; (i) selectively compressing the associated file; and (j) transmitting the associated file to the workstation in a fourth format, the fourth format being more compressed than the third format.
 25. The method of claim 24, wherein (i) comprises: selectively compressing the associated file using the deflate format.
 26. The method of claim 23, further comprises: (f) transmitting a third request to the web server for the associated file before receiving a second request from the workstation indicating the file; (g) receiving the associated file in a third format from the web server; (h) selectively compressing the associated file; and (i) transmitting the file to the workstation in a fourth format, the fourth format being more compressed than the third format.
 27. A system for transferring data over a computer network, the system comprising: a workstation coupled with the computer network and operative to request a web page from a web server coupled with the computer network; a proxy server coupled with the computer network and operative to receive the request, transmit a proxied request to the web server and receive the web page from the web server in a first format; and a compressor coupled with the proxy server and operative to selectively compress the web page to a second format; and wherein the proxy server is further operative to transmit the web page to the workstation in the second format.
 28. An apparatus for transferring data over a computer network, the apparatus comprising: a request receiver coupled with the computer network and operative to receive a first request from a workstation coupled with the computer network, the request indicating a web page and a web server associated with the web page; a request transmitter coupled with the request receiver and operative to transmit a second request to the web server; the second request comprising a request for the web page; a web page receiver coupled with the computer network and operative to receive the web page in a first format from the web server; a compressor coupled with the web page receiver and operative to selectively compress the received web page to a second format; and a web page transmitter coupled with the compressor and operative to transmit the web page to the workstation in the second format.